Universal Dependencies for Learner English

Authors

  • Yevgeni Berzak
  • Jessica Kenney
  • Carolyn Spadine
  • Jing Xian Wang
  • Lucia Lam
  • Keiko Sophie Mori
  • Sebastian Garza
  • Boris Katz
Abstract

We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. The UD annotations are tied to a pre-existing error annotation of the FCE, whereby full syntactic analyses are provided for both the original and error-corrected versions of each sentence. Furthermore, we delineate ESL annotation guidelines that allow for consistent syntactic treatment of ungrammatical English. Finally, we benchmark POS tagging and dependency parsing performance on the TLE dataset and measure the effect of grammatical errors on parsing accuracy. We envision the treebank supporting a wide range of linguistic and computational research on second language acquisition as well as automatic processing of ungrammatical language.

Similar resources

REALEC learner treebank: annotation principles and evaluation of automatic parsing

The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in the course of general English. The original corpus is manually annotated for learners’ errors and gives information on the error span, error type, and the possible correction of the mistake provided b...

Universal Dependencies-based syntactic features in detecting human translation varieties

In this paper, syntactic annotation is used to reveal linguistic properties of translations. We employ the Universal Dependencies framework to represent learner and professional translations of English mass-media texts into Russian (along with non-translated Russian texts of the same genre) with the aim to discover and describe syntactic specificity of translations produced at different levels ...

Towards Universal Dependencies for Learner Chinese

We propose an annotation scheme for learner Chinese in the Universal Dependencies (UD) framework. The scheme was adapted from a UD scheme for Mandarin Chinese to take interlanguage characteristics into account. We applied the scheme to a set of 100 sentences written by learners of Chinese as a foreign language, and we report inter-annotator agreement on syntactic annotation.

Relationship between Iranian EFL High School Students’ Knowledge of Universal Grammar and their Performance on Standardized General English Proficiency Tests

This study investigated the relationship between Iranian high school students’ Universal Grammar knowledge and their performance on such standardized general English proficiency tests as PET and FCE internationally administered by Cambridge University. To this end, 108 students were randomly chosen from some high schools located in Malayer from Hamedan. Since this study was correlational in nat...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Journal:
  • CoRR

Volume: abs/1605.04278

Publication date: 2016